13 May, 2020
Introduction
Data set:
Goal:
Material and Methods
Source: Kaggle
Project’s GitHub repository
Course-related packages
{tidyverse}
- dplyr
- tidyr
- ggplot2
- broom
- stringr
- tibble
{other}
- keras/tensorflow
- markdown
- knitr
- shiny
- patchwork
- GitHub
Course-unrelated packages
Results — no outliers on total protein expression

Results — breast cancer classes in the dataset are well represented

Results — breast cancer classes do not discriminate on age

Results — breast cancer and gender

Results — protein expression heatmap

Results — dimensionality reduction

Results — K-means clustering

Results — ANN model’s structure

Results — ANN performance

Discussion
- K-means clustering accuracy: 72.7%
- ANN model accuracy: 82.8%
- Collect more data for building more reliable models
- Combine proteome data with RNA-seq data to investigate further associations, for example through network analysis
- The tidyverse is a smart and elegant collection of R packages for data analysis and visualization
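An accuracy figure for an unsupervised method such as K-means needs some rule for turning cluster numbers into class predictions. The slides do not say which rule was used, so the following is only a minimal sketch of one common choice: assign each cluster the most frequent true class among its members, then count the matches. It is written in Python for illustration and is not the project's R code; the function name `cluster_accuracy` and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def cluster_accuracy(true_labels, cluster_ids):
    """Accuracy after mapping each cluster to its most common true label.

    Assumption: majority-vote mapping. The original analysis may have
    used a different rule to compare clusters with cancer classes.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    if not true_labels:
        raise ValueError("at least one sample is required")

    # Count how often each true label occurs inside each cluster.
    members = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        members[cluster][label] += 1

    # A sample counts as correct when it carries its cluster's majority label.
    correct = sum(counts.most_common(1)[0][1] for counts in members.values())
    return correct / len(true_labels)


# Hypothetical toy data: three classes (A, B, C) and three clusters.
true_labels = ["A", "A", "A", "B", "B", "B", "C", "C"]
cluster_ids = [0, 0, 1, 1, 1, 1, 2, 0]

accuracy = cluster_accuracy(true_labels, cluster_ids)
print(f"Clustering accuracy: {accuracy:.1%}")
```

Because the majority-vote mapping is chosen after seeing the true labels, this measure tends to be optimistic, which is worth keeping in mind when comparing it with the held-out accuracy of a supervised model such as the ANN.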
Thank you for your attention
Shiny App